An Empirical Study of Imputation Techniques for Software Data Sets

نویسندگان

  • Sumanth Yenduri
  • Karki
  • Kraft
  • Jones
  • Ward
  • Gu
چکیده

ii ACKNOWLEDGEMENTS I would like to take this opportunity to thank my Mother who has been most supportive and has been my source of inspiration whenever I needed direction. I would like to thank my advisor, Dr. Iyengar for his invaluable guidance throughout my Ph.D. studies at LSU. I learnt the importance of honesty, integrity, creativity, critical-thinking, and hard work as a researcher through him. The work proposed here is not separable from his insightful and nurturing ideas shared with me. It is with my deepest gratitude that I acknowledge him for all his help and cooperation in providing me his guidance and moral support. being on my Ph.D. committee and giving me insightful comments regarding my research. I would also like to thank Dr. Perkins who guided me through my Master's Project. Finally, I would like to thank the entire faculty, the staff, and friends in the Computer Science Dept., at LSU, who have made my stay a memorable one.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Comparison of Imputation Techniques for Efficient Prediction of Software Fault Proneness in Classes

Missing data is a persistent problem in almost all areas of empirical research. The missing data must be treated very carefully, as data plays a fundamental role in every analysis. Improper treatment can distort the analysis or generate biased results. In this paper, we compare and contrast various imputation techniques on missing data sets and make an empirical evaluation of these methods so a...

متن کامل

An Empirical Comparison of Performance of the Unified Approach to Linearization of Variance Estimation after Imputation with Some Other Methods

Imputation is one of the most common methods to reduce item non_response effects. Imputation results in a complete data set, and then it is possible to use naϊve estimators. After using most of common imputation methods, mean and total (imputation estimators) are still unbiased. However their variances (imputation variances) are underestimated by naϊve variance estimators. Sampling mechanism an...

متن کامل

Analyzing Data Sets with Missing Data: An Empirical Evaluation of Imputation Methods and Likelihood-Based Methods

ÐMissing data are often encountered in data sets used to construct effort prediction models. Thus far, the common practice has been to ignore observations with missing data. This may result in biased prediction models. In this paper, we evaluate four missing data techniques (MDTs) in the context of software cost modeling: listwise deletion (LD), mean imputation (MI), similar response pattern im...

متن کامل

Influence of Pattern of Missing Data on Performance of Imputation Methods: An Example from National Data on Drug Injection in Prisons

Background Policy makers need models to be able to detect groups at high risk of HIV infection. Incomplete records and dirty data are frequently seen in national data sets. Presence of missing data challenges the practice of model development. Several studies suggested that performance of imputation methods is acceptable when missing rate is moderate. One of the issues which was of less concern...

متن کامل

A Short Note on Using Multiple Imputation Techniques for Very Small Data Sets

This short note describes a simple experiment to investigate the value of using multiple imputation (MI) methods [2, 3]. We are particularly interested in whether a simple bootstrap based on a k-nearest neighbour (kNN) method can help address the problem of missing values in two very small, but typical, software project data sets. This is an important question because, unfortunately, many real-...

متن کامل

Missing data imputation in multivariable time series data

Multivariate time series data are found in a variety of fields such as bioinformatics, biology, genetics, astronomy, geography and finance. Many time series datasets contain missing data. Multivariate time series missing data imputation is a challenging topic and needs to be carefully considered before learning or predicting time series. Frequent researches have been done on the use of diffe...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2005